Automatic Multi-Document Arabic Text Summarization Using Clustering and Keyphrase Extraction
نویسندگان
چکیده
Automatic text summarization has become important due to the rapid growth of information texts since it is very difficult for human beings to manually summarize large documents of texts. A full understanding of the document is essential to form an ideal summary. However, achieving full understanding is either difficult or impossible for computers. Therefore, selecting important sentences from the original text and presenting these sentences as a summary present the most common techniques in automated text summarization. Arabic natural language processing lacks tools and resources which are essential to advance research in Arabic text summarization. In addition to the limited resources, there has been little attention and research done in this field. Arabic text summarization still suffer from low accuracy as they use simple summarization techniques. The aim of this research is to improve Arabic text summarization by using clustering and keyphrase extraction. This study proposes a combined clustering method to group Arabic documents into several clusters. Keyphrase extraction module is applied to extract important keyphrases from each cluster, which helps to identify the most important sentences and find similar sentences based on several similarity algorithms. These algorithms are applied to extract one sentence from a group of similar sentences while ignoring the other similar sentences. The Recall-Oriented Understudy for Gisting Evaluation (ROGUE) metrics were used for the evaluation. For the summarization dataset the corpus DUC2002 was used. This model achieved an accuracy of 43.4%. The experiments have proved that the proposed model has given better performance in comparison to other work.
منابع مشابه
Single Document Keyphrase Extraction Using Label Information
Keyphrases have found wide ranging application in NLP and IR tasks such as document summarization, indexing, labeling, clustering and classification. In this paper we pose the problem of extracting label specific keyphrases from a document which has document level metadata associated with it namely labels or tags (i.e. multi-labeled document). Unlike other, supervised or unsupervised, methods f...
متن کاملروش جدید متنکاوی برای استخراج اطلاعات زمینه کاربر بهمنظور بهبود رتبهبندی نتایج موتور جستجو
Today, the importance of text processing and its usages is well known among researchers and students. The amount of textual, documental materials increase day by day. So we need useful ways to save them and retrieve information from these materials. For example, search engines such as Google, Yahoo, Bing and etc. need to read so many web documents and retrieve the most similar ones to the user ...
متن کاملKeyphrase Extraction and Grouping Based on Association Rules
Keyphrases are important in capturing the content of a document and thus useful for many natural language processing tasks such as Information Retrieval, Document Classification, and Text Summarization. Keyphrase extraction aims to identify multi-word sequences from a collection of documents that more or less correspond to keyphrases. In this paper, we propose a new method for keyphrase extract...
متن کاملA Hybrid Approach to Extract Keyphrases from Medical Documents
Keyphrases are the phrases, consisting of one or more words, representing the important concepts in the articles. Keyphrases are useful for a variety of tasks such as text summarization, automatic indexing, clustering/classification, text mining etc. This paper presents a hybrid approach to keyphrase extraction from medical documents. The keyphrase extraction approach presented in this paper is...
متن کاملA survey on Automatic Text Summarization
Text summarization endeavors to produce a summary version of a text, while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to decipher through such massive amount of data, in order to extract the useful information, is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2015